13 research outputs found

    Decentralized Exploration in Multi-Armed Bandits

    We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment. The objective is to ensure privacy in the best arm identification problem between asynchronous, collaborative, and thrifty players. In the context of a digital service, we argue that this decentralized approach strikes a good balance between the interests of users and those of service providers: the providers optimize their services, while protecting the privacy of the users and saving resources. We define the privacy level as the amount of information an adversary could infer by intercepting the messages concerning a single user. We provide a generic algorithm, Decentralized Elimination, which uses any best arm identification algorithm as a subroutine. We prove that this algorithm ensures privacy, with a low communication cost, and that in comparison to the lower bound of the best arm identification problem, its sample complexity suffers from a penalty depending on the inverse of the probability of the most frequent players. Then, thanks to the genericity of the approach, we extend the proposed algorithm to non-stationary bandits. Finally, experiments illustrate and complete the analysis.

    Identification of Spectral Modifications Occurring during Reprogramming of Somatic Cells

    Recent technological advances in cell reprogramming by generation of induced pluripotent stem cells (iPSC) open major prospects for disease modelling and raise hopes of providing novel stem cell sources for regenerative medicine. However, research on iPSC still requires refining the criteria for the pluripotency stage of these cells and exploring their functional equivalence to human embryonic stem cells (ESC). We report here on the use of infrared microspectroscopy to follow the spectral modification of somatic cells during the reprogramming process. We show that iPSC adopt a chemical composition leading to a spectral signature indistinguishable from that of ESC and entirely different from that of the original somatic cells. Similarly, this technique allows a distinction to be made between partially and fully reprogrammed cells. We conclude that the infrared microspectroscopy signature is a novel method for evaluating induced pluripotency and can be added to the tests currently used for this purpose.

    The Non-stationary Stochastic Multi-armed Bandit Problem

    We consider a variant of the stochastic multi-armed bandit with K arms where the rewards are not assumed to be identically distributed, but are generated by a non-stationary stochastic process. We first study the unique best arm setting, in which a single arm remains the best throughout. Second, we study the general switching best arm setting, in which the best arm switches at some unknown steps. For both settings, we target problem-dependent bounds, instead of the more conservative problem-free bounds. We consider two classical problems: (1) identify a best arm with high probability (best arm identification), for which performance is measured by the sample complexity (number of samples before finding a near-optimal arm). To this end, we naturally extend the definition of sample complexity so that it makes sense in the switching best arm setting, which may be of independent interest. (2) Achieve the smallest cumulative regret (regret minimization), where the regret is measured with respect to the strategy pulling an arm with the best instantaneous mean at each step.

    Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem

    We consider a non-stationary formulation of the stochastic multi-armed bandit where the rewards are no longer assumed to be identically distributed. For the best-arm identification task, we introduce a version of SUCCESSIVE ELIMINATION based on random shuffling of the K arms. We prove that under a novel and mild assumption on the mean gap ∆, this simple but powerful modification achieves the same guarantees in terms of sample complexity and cumulative regret as its original version, but in a much wider class of problems, as it is no longer constrained to stationary distributions. We also show that the original SUCCESSIVE ELIMINATION fails to have controlled regret in this more general scenario, thus showing the benefit of shuffling. We then remove our mild assumption and adapt the algorithm to the best-arm identification task with switching arms. We adapt the definition of the sample complexity for that case and prove that, against an optimal policy with N − 1 switches of the optimal arm, this new algorithm achieves an expected sample complexity of O(∆^{−2} sqrt(N K δ^{−1} log(K/δ))), where δ is the probability of failure of the algorithm, and an expected cumulative regret of O(∆^{−1} sqrt(N T K log(T K))) after T time steps.
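    The shuffling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `pull(arm, t)` callback, the Hoeffding-style confidence radius, and the `max_rounds` cap are all assumptions made for illustration, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def successive_elimination_shuffled(pull, n_arms, delta=0.05,
                                    max_rounds=10_000, rng=None):
    """Return the arm kept after elimination.

    pull(arm, t) is a hypothetical callback returning a reward in [0, 1]
    for playing `arm` at global time step `t`.
    """
    rng = rng or random.Random(0)
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    t = 0  # global time step seen by the environment
    for round_idx in range(1, max_rounds + 1):
        if len(active) == 1:
            break
        # Play every remaining arm once per round, in a random order.
        rng.shuffle(active)
        for arm in active:
            sums[arm] += pull(arm, t)
            t += 1
        # Assumed Hoeffding-style radius after round_idx pulls per arm.
        radius = math.sqrt(
            math.log(4 * n_arms * round_idx ** 2 / delta) / (2 * round_idx)
        )
        means = {arm: sums[arm] / round_idx for arm in active}
        best = max(means.values())
        # Drop arms whose empirical mean is clearly below the leader's.
        active = [arm for arm in active if best - means[arm] < 2 * radius]
    # All surviving arms were pulled equally often, so sums are comparable.
    return max(active, key=lambda arm: sums[arm])
```

    Because each surviving arm is played once per round in a freshly drawn order, no arm is systematically tied to a particular position in time; the abstract's guarantees depend on assumptions this sketch does not check.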

    Random Forest for the Contextual Bandit Problem

    To address the contextual bandit problem, we propose an online random forest algorithm. The analysis of the proposed algorithm is based on the sample complexity needed to find the optimal decision stump. Then, the decision stumps are recursively stacked in a random collection of decision trees, BANDIT FOREST. We show that the proposed algorithm is optimal up to logarithmic factors. The dependence of the sample complexity upon the number of contextual variables is logarithmic. The computational cost of the proposed algorithm with respect to the time horizon is linear. These analytical results allow the proposed algorithm to be efficient in real applications, where the number of events to process is huge, and where we expect that some contextual variables, chosen from a large set, have potentially non-linear dependencies with the rewards. In the experiments done to illustrate the theoretical analysis, BANDIT FOREST obtains promising results in comparison with state-of-the-art algorithms.

    Node-based optimization of LoRa transmissions with Multi-Armed Bandit algorithms

    The use of Low Power Wide Area Networks (LPWANs) is growing due to their advantages in terms of low cost, energy efficiency and range. Although LPWANs attract the interest of industry and network operators, they face certain constraints related to energy consumption, network coverage and quality of service. In this paper we demonstrate that it is possible to optimize the performance of LoRaWAN (Long Range Wide Area Network), one of the most widely used LPWAN technologies. We suggest that nodes use lightweight learning methods, namely multi-armed bandit algorithms, to select the communication parameters (spreading factor and emission power). Extensive simulations show that such learning methods manage the trade-off between energy consumption and packet loss much better than an Adaptive Data Rate (ADR) algorithm that adapts spreading factors and transmission powers on the basis of Signal to Interference and Noise Ratio (SINR) values.
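    As a rough illustration of node-side parameter selection, the sketch below treats each (spreading factor, emission power) pair as one arm. The abstract does not name a specific bandit algorithm; UCB1 is used here as an assumption, and the power levels, the `UCB1Node` class, and the reward function passed to `run` are hypothetical.

```python
import itertools
import math

SPREADING_FACTORS = [7, 8, 9, 10, 11, 12]
TX_POWERS_DBM = [2, 8, 14]  # hypothetical subset of power levels
# One arm per (spreading factor, emission power) pair.
ARMS = list(itertools.product(SPREADING_FACTORS, TX_POWERS_DBM))


class UCB1Node:
    """UCB1 learner kept by a single node (assumed algorithm choice)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Try every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a]
            + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Incremental update of the empirical mean reward of the arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run(reward_fn, steps):
    """Drive one node for `steps` transmissions.

    reward_fn(sf, power_dbm) is a hypothetical callback returning a
    reward in [0, 1], for example delivery success minus an energy cost.
    """
    node = UCB1Node(len(ARMS))
    for _ in range(steps):
        arm = node.select()
        node.update(arm, reward_fn(*ARMS[arm]))
    return node
```

    A node would call `select` before each transmission and `update` once it learns whether the packet was delivered; how energy and packet loss are combined into a single reward is a design choice the abstract leaves open.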